
Adding functionality for unconstrained ligand minimization#15

Draft
delalamo wants to merge 29 commits into main from ligand-minimization

@delalamo
Owner

No description provided.

delalamo and others added 12 commits January 12, 2026 09:08
- Add Dockerfile using condaforge/miniforge3 for conda dependencies
- Add docker-build.yml workflow triggered by releases, tags, or manual dispatch
- Add .dockerignore to exclude build artifacts
- Update README with Docker usage instructions

Image published to ghcr.io/delalamo/graphrelax

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…mization

Detects missing residues in protein chains by checking residue numbering
discontinuities and C-N bond distances. Chains are split at gaps before
OpenMM minimization to prevent the creation of unrealistic peptide bonds
across gaps. Original chain IDs are restored after minimization.

- Add chain_gaps.py module with detect_chain_gaps, split_chains_at_gaps,
  and restore_chain_ids functions
- Integrate gap detection into relaxer.py relax() method
- Add split_chains_at_gaps config option (enabled by default)
- Add --no-split-gaps CLI flag to disable the feature
- Add comprehensive tests for chain gap detection
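The two gap criteria described above (a break in residue numbering, or a C-N distance too long for a peptide bond) can be sketched in plain Python. This is a minimal illustration, not the PR's chain_gaps.py: it assumes each residue is given as a hypothetical (number, N coordinates, C coordinates) tuple, ignores insertion codes, and uses a hypothetical 2.0 Å cutoff (a real peptide C-N bond is about 1.33 Å).

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Residue = Tuple[int, Optional[Coord], Optional[Coord]]  # (number, N, C)

# Hypothetical cutoff in Å; a real peptide C-N bond is ~1.33 Å.
MAX_PEPTIDE_BOND = 2.0


def detect_chain_gaps(
    residues: Sequence[Residue], max_bond: float = MAX_PEPTIDE_BOND
) -> List[int]:
    """Return each index i such that a gap lies between residue i and i + 1."""
    gaps: List[int] = []
    for i in range(len(residues) - 1):
        num_a, _, c_a = residues[i]
        num_b, n_b, _ = residues[i + 1]
        # Criterion 1: residue numbering is not consecutive.
        if num_b - num_a != 1:
            gaps.append(i)
        # Criterion 2: a backbone atom is missing, or C(i)-N(i+1) is too long.
        elif c_a is None or n_b is None or math.dist(c_a, n_b) > max_bond:
            gaps.append(i)
    return gaps


residues = [
    (1, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
    (2, (2.33, 0.0, 0.0), (3.0, 0.0, 0.0)),  # C(1)-N(2) = 1.33 Å: bonded
    (5, (4.33, 0.0, 0.0), (5.0, 0.0, 0.0)),  # numbering jumps 2 -> 5
    (6, (9.0, 0.0, 0.0), (10.0, 0.0, 0.0)),  # C(5)-N(6) = 4.0 Å: broken
]
print(detect_chain_gaps(residues))  # [1, 2]
```

Checking numbering first is cheap and catches most gaps; the distance test catches breaks where the numbering happens to be continuous.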
…atom

The bug was assigning a new chain ID for every atom at a gap start residue
instead of just once when entering a new segment. Added tracking of
processed gap starts to prevent duplicate chain assignments.
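A sketch of the fixed logic, assuming atoms arrive in file order as (chain ID, residue number) pairs and that gap_starts holds the first residue of each new segment (both representations, and the spare_ids parameter, are hypothetical). The key point is the processed set: a new chain ID is drawn once per segment, on the first atom of the residue that starts it, rather than once per atom.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_chains_at_gaps(
    atoms: Iterable[Tuple[str, int]],
    gap_starts: Set[Tuple[str, int]],
    spare_ids: Iterable[str],
) -> Tuple[List[str], Dict[str, str]]:
    """Give every segment that begins after a gap its own chain ID.

    Returns the chain ID for each atom plus a map new_id -> original_id,
    so the original IDs can be restored after minimization.
    """
    ids = iter(spare_ids)  # raises StopIteration if IDs run out (sketch only)
    new_ids: List[str] = []
    origin: Dict[str, str] = {}
    current: Dict[str, str] = {}  # original chain ID -> ID currently in use
    processed: Set[Tuple[str, int]] = set()
    for chain, resnum in atoms:
        current.setdefault(chain, chain)
        key = (chain, resnum)
        # Switch IDs only on the first atom of a segment-starting residue;
        # the rest of that residue's atoms stay in the same new segment.
        if key in gap_starts and key not in processed:
            processed.add(key)
            new_id = next(ids)
            origin[new_id] = chain
            current[chain] = new_id
        new_ids.append(current[chain])
    return new_ids, origin


new_ids, origin = split_chains_at_gaps(
    [("A", 1), ("A", 1), ("A", 2), ("A", 5), ("A", 5), ("A", 6)],
    gap_starts={("A", 5)},
    spare_ids="XYZ",
)
print(new_ids)  # ['A', 'A', 'A', 'X', 'X', 'X']
print(origin)   # {'X': 'A'}
```

Restoring the original IDs after minimization is then a lookup through the returned map, e.g. `[origin.get(c, c) for c in new_ids]`.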
- Free up disk space by removing unused .NET, GHC, and Boost packages
- Install CPU-only PyTorch to avoid large CUDA dependencies
- Use --no-cache-dir to minimize pip cache usage

The GitHub Actions runner was running out of disk space when installing
PyTorch with CUDA dependencies (~5-7GB) alongside conda packages.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Enable unconstrained minimization for protein-ligand complexes using
openmmforcefields for small molecule parameterization.

Changes:
- Add include_ligands, ligand_forcefield, and ligand_smiles config options
- Create ligand_utils.py module for ligand extraction and parameterization
- Add _relax_unconstrained_with_ligands method using SystemGenerator
- Update CLI with --include-ligands, --ligand-forcefield, --ligand-smiles
- Add [ligands] optional dependency group to pyproject.toml
- Add unit tests for ligand utilities
- Add ligand_utils.py to pylint exclude (optional dependency imports)

The implementation:
1. Separates protein (ATOM) and ligands (HETATM) from PDB
2. Processes protein with pdbfixer (avoiding terminal detection issues)
3. Parameterizes ligands with OpenFF Sage 2.0 (default) or GAFF2/Espaloma
4. Combines topologies and minimizes together

Usage:
  graphrelax -i complex.pdb -o minimized.pdb --include-ligands
  graphrelax -i complex.pdb -o minimized.pdb --include-ligands \
    --ligand-forcefield gaff-2.11 --ligand-smiles 'LIG:c1ccccc1'

Requires: pip install graphrelax[ligands]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
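Step 1 of the implementation (separating ATOM from HETATM records) amounts to filtering on the PDB record name in columns 1-6. The sketch below is illustrative only: the function name is hypothetical, and it assumes waters (residue name HOH, columns 18-20) should be excluded from the ligand set and that all other record types can be dropped.

```python
from typing import List, Tuple


def split_protein_and_ligands(pdb_text: str) -> Tuple[str, str]:
    """Split PDB text into protein (ATOM) and ligand (non-water HETATM) blocks."""
    protein: List[str] = []
    ligands: List[str] = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name, columns 1-6
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM" and line[17:20].strip() != "HOH":
            ligands.append(line)  # residue name is in columns 18-20
    return "\n".join(protein), "\n".join(ligands)


pdb = "\n".join([
    "ATOM      1  N   ALA A   1",
    "HETATM    2  C1  LIG A 101",
    "HETATM    3  O   HOH A 201",
    "END",
])
protein, ligands = split_protein_and_ligands(pdb)
print(ligands)  # HETATM    2  C1  LIG A 101
```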
…nary

Replace the large hardcoded SMILES dictionary with dynamic extraction
from PDB coordinates using RDKit bond perception:

- Rename get_common_ligand_smiles() to get_ion_smiles() (only ions need
  explicit SMILES since they're single atoms without bond info)
- Add is_single_atom_ligand() helper function
- Update create_openff_molecule() to try RDKit parsing first, then fall
  back to OpenFF PDB parsing, using ion lookup only for single atoms
- Update relaxer.py to use the new approach
- Update tests to reflect the refactored functions

This removes ~70 lines of hardcoded SMILES while making the code more
robust: it can now handle any ligand for which RDKit can perceive bonds.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
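The split between explicit SMILES for single atoms and bond perception for everything else can be sketched as follows. ION_SMILES and ion_smiles() are hypothetical stand-ins for get_ion_smiles() and its caller, the table is a small illustrative subset keyed by PDB component codes, and the RDKit/OpenFF parsing branch is represented only by returning None.

```python
from typing import Dict, List, Optional

# Hypothetical subset: single atoms carry no bond information, so their
# SMILES (including formal charge) must be supplied explicitly.
ION_SMILES: Dict[str, str] = {
    "NA": "[Na+]",
    "K": "[K+]",
    "MG": "[Mg+2]",
    "CA": "[Ca+2]",
    "ZN": "[Zn+2]",
    "CL": "[Cl-]",
}


def is_single_atom_ligand(hetatm_lines: List[str]) -> bool:
    """True if the ligand residue consists of exactly one atom record."""
    return len(hetatm_lines) == 1


def ion_smiles(resname: str, hetatm_lines: List[str]) -> Optional[str]:
    """Use the lookup table only for single atoms; multi-atom ligands return
    None and would go to bond perception (RDKit, then OpenFF) instead."""
    if not is_single_atom_ligand(hetatm_lines):
        return None
    return ION_SMILES.get(resname.strip().upper())


print(ion_smiles("MG", ["HETATM    1 MG    MG A 301"]))  # [Mg+2]
```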
The ligand libraries (openmmforcefields, openff-toolkit, rdkit) are
only available via conda-forge, not PyPI. Instead of lazy imports,
use top-level try-except blocks with clear ImportError messages that
tell users exactly how to install the missing dependencies:

- ligand_utils.py: Check for openff-toolkit and rdkit at import time
- relaxer.py: Check for openmmforcefields at import time
- Both provide clear conda install commands with version numbers
- pyproject.toml: Updated comment with full conda install command
- cli.py: Removed optional dependency check (now handled at runtime)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
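The fail-fast import pattern can be sketched with importlib. Wrapping it in a helper (require_module, a hypothetical name) keeps the message format in one place; calling it at module top level gives the same import-time behavior as a bare try/except. The install hint shown is a plain conda-forge command and does not reproduce the version pins the PR mentions.

```python
import importlib
from types import ModuleType


def require_module(name: str, install_hint: str) -> ModuleType:
    """Import `name`, or fail immediately with an actionable message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise ImportError(
            f"{name} is required but could not be imported.\n"
            f"Install it with:\n    {install_hint}"
        ) from exc


# At the top of a module such as ligand_utils.py, this runs at import time:
# rdkit = require_module("rdkit", "conda install -c conda-forge rdkit")
```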
delalamo and others added 16 commits January 13, 2026 10:32
- Add artifacts.py with comprehensive artifact detection for buffers,
  cryoprotectants, detergents, lipids, reducing agents, and halide ions
- Auto-remove artifacts by default, preserve biologically relevant ions
- Add --keep-all-ligands and --keep-ligand flags to whitelist residues
- Update README with mamba/micromamba installation instructions
- Add tests for artifact detection and removal

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
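A minimal sketch of artifact removal with a whitelist, assuming artifacts are matched by residue name (PDB columns 18-20) on HETATM records. The ARTIFACTS set is a hypothetical, far-from-complete subset using PDB Chemical Component Dictionary codes; the keep and keep_all parameters are assumed to correspond to --keep-ligand and --keep-all-ligands.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical subset of crystallization/cryo-EM additives (PDB CCD codes).
ARTIFACTS = {
    "GOL", "EDO",         # cryoprotectants: glycerol, ethylene glycol
    "TRS", "EPE", "MES",  # buffers: Tris, HEPES, MES
    "BOG",                # detergent: octyl glucoside
    "BME",                # reducing agent: beta-mercaptoethanol
    "CL", "BR", "IOD",    # halide ions
}


def remove_artifacts(
    pdb_lines: Iterable[str],
    keep: Optional[Set[str]] = None,
    keep_all: bool = False,
) -> List[str]:
    """Drop HETATM records of known artifacts unless they are whitelisted."""
    whitelist = {k.upper() for k in (keep or set())}
    out: List[str] = []
    for line in pdb_lines:
        if not keep_all and line.startswith("HETATM"):
            resname = line[17:20].strip().upper()
            if resname in ARTIFACTS and resname not in whitelist:
                continue  # artifact: skip this record
        out.append(line)  # protein, non-artifact ligands (e.g. ZN), etc.
    return out


lines = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  C1  GOL A 101",
    "HETATM    3 ZN    ZN A 102",
]
print(len(remove_artifacts(lines)))  # 2
```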
Resolved conflicts:
- README.md: Combined mamba installation docs with pre-idealize feature
- cli.py: Added both artifact removal flags and pre-idealize flags
- relaxer.py: Combined ligand_utils and idealize imports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove action="append" in favor of single comma-separated string
- Fix parsing logic to handle single string instead of list
- Add unit tests for --keep-ligand CLI argument parsing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
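With action="append" gone, --keep-ligand takes one string that is split on commas. A sketch using argparse; build_parser() and parse_keep_ligands() are hypothetical names, and the whitespace stripping, empty-entry skipping, and upper-casing are assumptions about how the names are normalized.

```python
import argparse
from typing import Optional, Set


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="graphrelax")
    # One comma-separated string such as "GOL,EDO", not a repeatable option.
    parser.add_argument("--keep-ligand", type=str, default=None,
                        help="Comma-separated residue names to keep")
    return parser


def parse_keep_ligands(value: Optional[str]) -> Set[str]:
    """Turn 'GOL, edo,,SO4' into {'GOL', 'EDO', 'SO4'}."""
    if not value:
        return set()
    return {name.strip().upper() for name in value.split(",") if name.strip()}


args = build_parser().parse_args(["--keep-ligand", "GOL, edo,,SO4"])
print(sorted(parse_keep_ligands(args.keep_ligand)))  # ['EDO', 'GOL', 'SO4']
```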
Code cleanup:
- Consolidate WATER_RESIDUES to artifacts.py (was defined in 3 places)
- Create shared check_gpu_available() in utils.py (was duplicated)
- Remove unused add_ter_records_at_gaps() from chain_gaps.py
- Remove unused _relax_direct() method from relaxer.py (~100 lines)
- Remove corresponding test class TestAddTerRecordsAtGaps

README update:
- Update --keep-ligand documentation to show comma-separated syntax

Total: ~180 lines of redundant code removed.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove requirements.txt (dependencies in pyproject.toml)
- Update test_relaxer_integration.py to use public API:
  - Use check_gpu_available() from utils instead of removed method
  - Use relax() with constrained=False instead of removed _relax_direct()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Make it clear that openmmforcefields, openff-toolkit, and rdkit
  are conda-forge only (like pdbfixer)
- Add ligand support installation command to PyPI section
- Reorganize dependencies table to show required vs optional

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- pdbfixer, openmmforcefields, openff-toolkit, and rdkit are now all
  required for installation
- Simplified installation instructions
- Removed "optional" labeling from dependencies table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change idealization from opt-in (--pre-idealize) to opt-out (--no-idealize)
- IdealizeConfig.enabled now defaults to True
- Add --ignore-missing-residues flag to skip adding residues from SEQRES
- Add --overwrite flag to allow overwriting output files
- Preserve residue numbering with keepIds=True in PDB output
- Update README to reflect new default behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
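The opt-out idealization flag and the overwrite guard map naturally onto argparse's store_false and store_true actions. This is a sketch under assumptions: only the two flag names come from the commit above, while the dest name, build_parser(), and check_output() are hypothetical.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="graphrelax")
    # store_false defaults to True: idealization is on unless --no-idealize.
    parser.add_argument("--no-idealize", dest="idealize", action="store_false")
    parser.add_argument("--overwrite", action="store_true")
    return parser


def check_output(path: str, overwrite: bool) -> None:
    """Refuse to clobber an existing output file unless --overwrite is given."""
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists; pass --overwrite to replace it")


args = build_parser().parse_args([])
print(args.idealize, args.overwrite)  # True False
```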
- Add renumber_residues_sequential() to idealize.py to ensure sequential
  residue numbering (1, 2, 3...) per chain after pdbfixer adds missing
  residues. This fixes false chain gap detection caused by non-sequential
  residue numbers from pdbfixer.
- Add CLI warning when using resfile with idealization enabled, since
  residue numbers in the resfile must match the idealized structure
- Update README to document the residue renumbering behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
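Sequential per-chain renumbering is a running counter per chain ID. The sketch below assumes one entry per residue as hypothetical (chain ID, old number, insertion code) tuples; old numbers and insertion codes are simply discarded. Because an inserted residue such as 6A receives its own integer, a resfile written against the original numbering no longer lines up, which is what the new CLI warning is about.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int, str]  # (chain_id, residue_number, insertion_code)


def renumber_residues_sequential(residues: List[Residue]) -> List[Tuple[str, int]]:
    """Renumber residues 1, 2, 3, ... within each chain, in file order."""
    counters: Dict[str, int] = {}
    renumbered: List[Tuple[str, int]] = []
    for chain, _old_number, _icode in residues:
        counters[chain] = counters.get(chain, 0) + 1
        renumbered.append((chain, counters[chain]))
    return renumbered


print(renumber_residues_sequential(
    [("A", 5, ""), ("A", 6, ""), ("A", 6, "A"), ("B", 100, "")]
))  # [('A', 1), ('A', 2), ('A', 3), ('B', 1)]
```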
Fix bug where ligands were extracted before the ligand-presence check,
causing the ligand-aware minimization path to never be triggered.

Now for unconstrained minimization with ligands:
- Ligands are detected before any extraction
- Full PDB (with ligands) is passed to _relax_unconstrained()
- Ligands are parameterized via openmmforcefields and minimized
  together with the protein

For constrained minimization, ligands are still extracted and restored
unchanged since AmberRelaxation cannot handle arbitrary ligands.

This fixes protein-ligand clashes that occurred when the protein moved
during minimization while ligands stayed in their original positions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
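The ordering bug is easiest to see as a dispatch function: the ligand-presence check must run on the full structure, before any extraction, or it always finds nothing. In this sketch the path labels, has_ligands(), and choose_path() are hypothetical, and waters are assumed to be identified by the residue name HOH.

```python
from typing import Iterable, List


def has_ligands(pdb_lines: Iterable[str]) -> bool:
    """True if any non-water HETATM record is present."""
    return any(
        line.startswith("HETATM") and line[17:20].strip() != "HOH"
        for line in pdb_lines
    )


def choose_path(pdb_lines: List[str], constrained: bool) -> str:
    """Decide how ligands are treated; pdb_lines must be the FULL structure."""
    if constrained:
        # AmberRelaxation cannot handle arbitrary ligands, so they are
        # extracted before relaxation and restored unchanged afterward.
        return "extract-and-restore"
    return "minimize-with-ligands" if has_ligands(pdb_lines) else "protein-only"


full = ["ATOM      1  N   ALA A   1", "HETATM    2  C1  LIG A 101"]
print(choose_path(full, constrained=False))      # minimize-with-ligands
# The old bug: checking after extraction never sees the ligand.
print(choose_path(full[:1], constrained=False))  # protein-only
```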
Update idealization pipeline to properly handle ligands:

1. Add missing residues and atoms (protein only)
2. Idealize bond lengths and angles
3. Minimize protein with constraints (without ligands)
4. Reintroduce ligands and minimize protein+ligand complex together

This ensures ligands move with the protein during idealization rather
than staying at their original coordinates while the protein moves.

New function minimize_with_ligands() uses openmmforcefields to
parameterize ligands and perform constrained minimization on the
full protein-ligand complex.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When ligands containing transition metals (heme, Fe-S clusters,
chlorophylls, etc.) are detected, raise a clear error explaining
the options instead of attempting to parameterize them.

- Add UNPARAMETERIZABLE_COFACTORS set in ligand_utils.py
- Add is_unparameterizable_cofactor() check function
- Update relaxer and idealize to fail early with helpful message

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
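An early-exit check of this kind can be sketched as a set lookup on residue names. The UNPARAMETERIZABLE_COFACTORS contents below are a hypothetical subset using PDB component codes (HEM and HEC for hemes, SF4 and FES for iron-sulfur clusters, CLA for chlorophyll a), and the error text is illustrative; the PR's message also explains the available options, which are not reproduced here.

```python
from typing import Iterable

# Hypothetical subset of metal-containing cofactors (PDB CCD codes).
UNPARAMETERIZABLE_COFACTORS = {
    "HEM",  # heme b (Fe)
    "HEC",  # heme c (Fe)
    "SF4",  # [4Fe-4S] cluster
    "FES",  # [2Fe-2S] cluster
    "CLA",  # chlorophyll a (Mg)
}


def is_unparameterizable_cofactor(resname: str) -> bool:
    return resname.strip().upper() in UNPARAMETERIZABLE_COFACTORS


def check_ligands(resnames: Iterable[str]) -> None:
    """Fail early, before any force-field work, with an explanatory error."""
    bad = sorted({r.strip().upper() for r in resnames
                  if is_unparameterizable_cofactor(r)})
    if bad:
        raise ValueError(
            f"Cannot parameterize metal-containing cofactor(s): {', '.join(bad)}"
        )


check_ligands(["ATP", "NAG"])  # passes silently
```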
Don't silently catch and continue when ligand parameterization fails
during idealization - let it fail with a clear error message instead
of failing later during relaxation.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
If create_openff_molecule() fails, let the error propagate rather
than silently skipping the ligand and then failing later.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tion

- Remove openmmforcefields dependency for ligand handling
- Add ligand exclusion zone approach in relaxer.py: ligand atoms are
  added as massless dummy particles with LJ repulsion, preventing
  protein from moving into ligand space during minimization
- Simplify idealize.py: ligands are extracted before protein
  minimization and restored afterward at original positions
- Fix PyTorch deprecation warning in tensor_utils.py (use tuple for
  multidimensional indexing)
- Add PDBe SMILES fetching for ligand identification (ligand_utils.py)
- Remove complex ligand parameterization code that was failing due to
  PDB files lacking bond information for HETATM records

This approach is more robust because:
1. No need for ligand force field parameters
2. Works with any ligand without SMILES
3. Protein minimizes fully while respecting ligand positions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
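The exclusion-zone idea can be illustrated without OpenMM: a protein atom feels only the repulsive r^-12 part of a Lennard-Jones potential from each ligand atom, and ligand positions are never updated, mirroring the role of the massless dummy particles. This is a toy steepest-descent sketch, not the PR's relaxer.py; sigma, epsilon, the step size, and the iteration count are hypothetical values in OpenMM-style units (nm, kJ/mol).

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def repulsion_energy(atom: Vec, ligand_atoms: Sequence[Vec],
                     sigma: float = 0.3, epsilon: float = 0.5) -> float:
    """Sum of repulsive LJ terms 4*eps*(sigma/r)**12 over ligand atoms."""
    return sum(4.0 * epsilon * (sigma / math.dist(atom, lig)) ** 12
               for lig in ligand_atoms)


def relax_atom(atom: Vec, ligand_atoms: Sequence[Vec], sigma: float = 0.3,
               epsilon: float = 0.5, step: float = 1e-5,
               iters: int = 500) -> Vec:
    """Steepest descent on the repulsion energy.

    Only the protein atom moves; ligand coordinates are read but never
    updated, which is what keeps the ligand region excluded.
    """
    x = list(atom)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0]
        for lig in ligand_atoms:
            d = [xi - li for xi, li in zip(x, lig)]
            r = math.sqrt(sum(di * di for di in d))
            # dE/dx_k = -48 * eps * sigma**12 * (x_k - lig_k) / r**14
            coeff = -48.0 * epsilon * sigma ** 12 / r ** 14
            for k in range(3):
                grad[k] += coeff * d[k]
        x = [xi - step * g for xi, g in zip(x, grad)]
    return (x[0], x[1], x[2])


ligand = [(0.0, 0.0, 0.0)]
start = (0.25, 0.0, 0.0)  # protein atom starts inside the exclusion zone
end = relax_atom(start, ligand)
print(math.dist(end, ligand[0]) > 0.3)  # True: pushed out past sigma
```

Because the repulsion decays as r^-12, the atom is pushed firmly out of the ligand's space but feels almost nothing a few sigma away, so the rest of the protein minimizes as if the ligand were a rigid obstacle.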
@delalamo delalamo marked this pull request as draft January 22, 2026 12:22
@delalamo
Owner Author

What a god damn mess. I need to stop vibe coding and go outside


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove common xtal/em reagents from structures by default
Ligands can't be minimized without constraints due to incompatibilities with force field

2 participants